Abstract

We propose a nonlinear filtering framework for approaching the problems 
of channel state tracking and spatiotemporal channel gain prediction in mobile
wireless sensor networks, in a Bayesian setting. We assume that the wireless
channel constitutes an observable (by the sensors/network nodes), spatiotemporal,
conditionally Gaussian stochastic process, which is statistically dependent
on a set of hidden channel parameters, called the channel state. The channel 
state evolves in time according to a known, non stationary, nonlinear and/or 
non Gaussian Markov stochastic kernel. This formulation results in a partially 
observable system, with a temporally varying global state and spatiotemporally 
varying observations. Recognizing the intractability of general nonlinear state 
estimation, we advocate the use of grid based approximate filters as an effective 
and robust means for recursive tracking of the channel state. We also propose a 
sequential spatiotemporal predictor for tracking the channel gains at any point 
in time and space, providing real time sequential estimates for the respective 
channel gain map, for each sensor in the network. Additionally, we show that 
both estimators converge towards the true respective MMSE optimal estimators, 
in a common, relatively strong sense. Numerical simulations corroborate the 
practical effectiveness of the proposed approach. 

Index Terms 

Mobile Wireless Sensor Networks, Channel State Estimation, Spatiotemporal Channel
Prediction, Nonlinear Filtering, Sequential Estimation, Markov Processes.


I. Introduction 

As a result of the growing interest in wireless networks and distributed communication
and processing systems, new, challenging problems have recently transpired, related not 
only to the flow of information over networks, but also to the estimation and control of the 
underlying physical layer. In a large number of important applications, accurate estimation 
of Channel State Information (CSI), or Statistical CSI (SCSI), for all of the nodes/sensors in a 
wireless network is essential. Popular examples include distributed collaborative beamforming 
and related Space Division Multiple Access (SDMA) techniques, target detection and estimation 
in distributed networked radar systems, and information theoretic physical layer security via 
transmission optimization, just to name a few.

Traditionally, in such applications, CSI and SCSI estimation is done via pilot based schemes,
blind channel estimation techniques, or even through averaging from rough channel
observations. Apart from the fact that naive extension of the conventional techniques to larger
scale wireless networks requires collaboration between the network nodes, which can be
both bandwidth and power intensive, these techniques are only sufficient for relatively lower
rate and/or quasistatic environments, where the statistics of the communication medium
do not change significantly over time. However, the behavior of most indoor and outdoor
communication environments of practical interest is intrinsically time varying (see, e.g., 0). 

In addition to the temporal variation of the wireless medium, recently, considerable
interest has been expressed concerning its spatial variation as well. In fact, learning how the
communication channel evolves through space is tightly connected to the ability of a
network to assess the quality of the channel at previously unexplored locations in the space,
based on local channel measurements at the respective sensors and by exploiting spatial
statistical correlations among them. Such knowledge would be beneficial in a number of new
and important applications. Examples include mobile beamforming [8], mobility enhanced
physical layer security, communication-aware motion and path planning, network
routing, connectivity maintenance and physical layer based dynamic coverage [12]-[14]. In
all these cases, dynamic spatiotemporal channel estimation/tracking and prediction becomes 
an essential part of mobility control, since it would provide valuable physical layer related 
information (channel maps), which is absolutely necessary for dynamic decision making and 
stochastic control. 

Regarding the explicit use of the idea of parameter tracking in channel estimation, important
work has been done on identification/characterization of multipath wireless channels. For
example, in [15], a sparse variational Bayesian extension of the popular SAGE algorithm [16]
was developed, aiming at high resolution parameter estimation of the multipath components
of spatially and frequency selective wireless channels. In [17], the problems of detection,
estimation and tracking of MIMO radio propagation parameters were considered, where an
efficient state space approach was developed, based on the proper use of the Extended Kalman
Filter. A similar problem was also considered in [18], where a specially designed estimation
algorithm was proposed, based on particle filtering.

To the best of the authors’ knowledge, the first basic approach to joint spatiotemporal
channel (specifically shadowing) tracking and prediction was recently presented in [19], [20],
where the use of Channel Gain (CG) maps was advocated as an advantageous alternative to
Power Spectral Density (PSD) maps for cooperative spectrum sensing in the context of cognitive
radios. The overall formulation of the problem presented in [19], [20] is based on a direct
fusion of previously proposed results in wireless channel modeling [21] and spatiotemporal
Kalman filtering [22], also known in the literature as Kriged Kalman Filtering (KKF) [23], [24].
Although analytically appealing, the state space model considered in [19], [20] for describing
the spatiotemporal evolution of the wireless channel is rather restrictive; both the dependence 
of the shadowing field on its previous value in time and its spatial interactions are characterized 
by purely linear functional relationships, focusing mainly on modeling the spatiotemporal 
variations of the trend of the field. 

In this work, in order to facilitate conceptualization, we consider a simple network
configuration, comprising a “reference” point/antenna capable of broadcasting global information
in the space, as well as a set of possibly mobile network nodes/sensors, capable either of local
message exchange, or communicating with a fusion center (see Fig. 1). For concreteness, a
channel learning scenario is considered where, at each time instant, the reference antenna
broadcasts a generic signal to all the sensors at the same time, which in turn use the acquired
measurements in order to learn the basic characteristics of the channel, and subsequently
make consistent predictions regarding its quality for any point in time and space. The statistical
model describing the joint spatiotemporal behavior of the channel measurements gathered at
the sensors is inspired by [25]. However, different from [25], the descriptive channel parameters
(e.g., the path loss exponent, the shadowing power, etc.), referred to here as the channel state,
are assumed to be temporally varying. Specifically, we assume that the whole channel state
constitutes a Markov process, with known, but potentially non stationary, nonlinear and/or non
Gaussian transition model. Essentially, under this formulation, the spatiotemporal evolution
of the channel is conveniently modeled as a general two layer stochastic system, or, in more
specific terms, as a partially observable dynamical system (with Markovian dynamics), more
commonly referred to as a Hidden Markov Model (HMM) [26].

The proposed formulation can naturally lead to a full blown state space channel description
in terms of generality. Compared to [19], it is more general, since it can deal with complex
variations in the channel characteristics, other than linear variations in the shadowing trend.
However, in our state space description of the channel, spatial statistical dependencies are
present only in the observations process, whereas in [19], the trend of the shadowing
component of the channel, constituting the hidden state, respectively, is jointly spatiotemporally
colored. Also, here, we will consider the detrended problem, similar to the one treated in [25] (in
a non Bayesian framework) and which has proven to be in good agreement with reality as well.
A complete channel model, combining both a non zero spatiotemporally varying shadowing
trend in the fashion of [19], [20] with temporally varying channel parameters advocated here,
results in a non trivial problem in nonlinear estimation and constitutes a subject of future 
research. 

Our main contributions are summarized in the following. 1) Recognizing the
obvious intractability of state estimation in partially observable nonlinear systems, we propose 
the use of grid based approximate nonlinear recursive filters for sequential channel state 
tracking. Due to the relatively small dimension of the channel state, grid based methods 
constitute excellent approximation candidates for the problems at hand. Then, exploiting 
filtered estimates of the channel state, a recursive spatiotemporal predictor of the channel 
gains (magnitudes) is developed, providing real time sequential estimates for the respective 
CG map, for each sensor in the network. 2) We provide a set of simple, relaxed conditions,
under which the proposed channel state tracker briefly described above is asymptotically 
optimal, in the sense that it converges to the respective true MMSE channel state estimator, 
in a relatively strong sense. The convergence of the proposed spatiotemporal predictor is 
established in exactly the same sense, providing a unified convergence criterion for both 
sequential estimators. 

The results presented in this paper essentially show that grid based approximate nonlinear
filtering is meaningfully applicable to the channel state tracking and spatiotemporal
channel prediction problems of interest. As we will see, this is possible by approximating the
complex nonlinearly varying processes modeling the evolution of the channel parameters by
appropriately designed Markov chains with finite state spaces. Conversely,
the asymptotic optimality properties of the proposed approach clearly justify the use of such
Markov chains as an excellent approximation choice for the highly nonlinear problems at hand.

Figure 1: The wireless network of interest.

The paper is organized as follows. In Section II, we present a detailed formulation of the
problems of interest, along with some mild technical assumptions on the structure of the HMM
channel description under consideration. In Section III, we present a number of essential 
results on asymptotically optimal, grid based recursive filtering and prediction of hidden 
Markov processes. Section IV is devoted to the development of the proposed channel state 
tracking and spatiotemporal channel prediction schemes, along with a complete theoretical 
justification, including the presentation of new asymptotic results. In Section V, representative 
numerical simulations are presented, corroborating the practical effectiveness of the proposed
approach. Finally, Section VI concludes the paper. 

II. System Model & Problem Formulation 

For simplicity, we consider a wireless network of typical form, a high level illustration of
which is shown in Fig. 1. The environment is assumed to be a closed planar region
$\mathcal{S} \subset \mathbb{R}^2$,
where, as already stated above, there exists a fixed, stationary antenna at a reference position,
capable of at least information broadcasting. There also exists a set of $N$ single antenna sensors,
possibly mobile, monitoring the channel relative to the reference antenna. These sensors may 
be a subset of the total nodes in the network and are responsible for the respective channel 
estimation tasks. The sensors can cooperate, and further, can either communicate with a 
fusion center (in a centralized setting), or exchange basic messages amongst each other (in 
a decentralized/infrastructureless scenario) using a low rate dedicated channel. Concerning 
channel modeling, we adopt a flat fading model between each node and the reference antenna. 
It is additionally assumed that channel reciprocity holds and that all network nodes can 
perfectly observe their individual channel realizations (e.g. magnitudes and potentially phases) 
relative to the reference antenna [25]. The channels are modeled as spatially and temporally
correlated, discrete time random processes (spatiotemporal random fields), sharing the same 
channel environment, at least as far as the underlying characteristics of the communication 
medium are concerned. 

As already mentioned, the channel state encompasses statistics of the communication 
medium, and is here modeled as a multidimensional discrete time stochastic process, evolving 
in time according to a known statistical model. The channel state is assumed to be hidden 
from the network nodes; the nodes can observe their respective channel realizations, but they
cannot directly observe the characteristics of the mechanism that generates these realizations.
Naturally, this stochastic structure gives rise to the description of the channel state and the
associated observation process(es) as a partially observable stochastic dynamical system.

Let us describe our problem in more explicit mathematical terms. All stochastic processes
are defined on a common probability triplet $(\Omega, \mathscr{F}, \mathcal{P})$. Let
$X_t \equiv X_t(\omega) \in \mathbb{R}^{M \times 1}$, $t \in \mathbb{N}$, $\omega \in \Omega$,
denote the hidden channel state. Under the flat fading assumption, the complex channel process
relative to the reference antenna at each network node $i \in \mathbb{N}^+_N$, located at
$\mathbf{p}_i \equiv \mathbf{p}_i(t) \in \mathcal{S}$ (that is, the nodes might be moving), can be
decomposed as [27]

$$\Upsilon_i(\mathbf{p}_i(t), X_t) \equiv \Upsilon^i_t(X_t) = \Upsilon^{PL}_i(X_t) \underbrace{\Upsilon^{SH}_i(X_t) \Upsilon^{MF}_i(t)}_{\text{fading}} \exp\left(j \frac{2 \pi d_i(t)}{\lambda}\right), \quad (1)$$

where $j \triangleq \sqrt{-1}$, $\lambda \in \mathbb{R}_{++}$ denotes the wavelength employed for the
communication and where:
1) $\Upsilon^{PL}_i(X_t) \in \mathbb{R}$ denotes path loss, defined as
$\Upsilon^{PL}_i(X_t) \triangleq \|\mathbf{p}_i(t) - \mathbf{p}_{ref}\|_2^{-\mu(X_t)/2} \equiv (d_i(t))^{-\mu(X_t)/2}$,
where $\mu(X_t) \in \mathbb{R}_+$ is the state dependent path loss exponent, which is the
same for all network nodes, and $\mathbf{p}_{ref} \in \mathcal{S}$ denotes the position of the
reference antenna in $\mathcal{S}$.
2) $\Upsilon^{SH}_i(X_t) \in \mathbb{R}$ denotes the shadowing part of the channel model and its square,
conditionally on $X_t$, constitutes a base-10 log-normal random variable with zero location and
scale depending on $X_t$.
3) $\Upsilon^{MF}_i(t) \in \mathbb{C}$ represents multipath fading, which, for simplicity, is assumed to be
a spatiotemporally white,$^1$ strictly stationary process with fully known statistical description,
not associated with $X_t$, therefore being an unpredictable complex “observation noise”.

Making the substitution
$\Upsilon_i(\mathbf{p}_i(t), X_t) \leftarrow \exp(-j 2 \pi d_i(t) / \lambda) \Upsilon_i(\mathbf{p}_i(t), X_t)$
and using properties of the complex logarithm, we can define the observations of the $i$-th node
in logarithmic scale as

$$y^i_t \triangleq 10 \log_{10} |\Upsilon^i_t(X_t)|^2 - 10\, \mathbb{E}\{\log_{10} |\Upsilon^{MF}_i(t)|^2\}$$
$$= -10 \mu(X_t) \log_{10}(d_i(t)) + 10 \log_{10} (\Upsilon^{SH}_i(X_t))^2 + 10\, \overline{\log_{10} |\Upsilon^{MF}_i(t)|^2}$$
$$\triangleq \alpha^i_t \mu(X_t) + \sigma^i_t(X_t) + \xi^i_t, \quad (2)$$

where $\overline{(\cdot)}$ denotes the zero mean version of a random variable. We should emphasize here
that by “measurement” or “observation” we refer to the predictable component of the channel,
which is described in terms of the channel magnitude.
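As a small illustration of the additive log-scale model in (2), the following Python sketch assembles a single observation from its three terms. It assumes, reading (2) term by term, that the coefficient of the path loss exponent is $-10 \log_{10}(d_i(t))$; the function name `log_scale_observation` and its arguments are hypothetical and chosen only for this example.

```python
import math


def log_scale_observation(distance, path_loss_exponent, shadowing_db, fading_db):
    """Return alpha * mu + sigma + xi for one node, following the structure of (2).

    distance           -- d_i(t), distance from the node to the reference antenna (> 0)
    path_loss_exponent -- mu(X_t)
    shadowing_db       -- sigma_t^i(X_t), shadowing term in dB
    fading_db          -- xi_t^i, zero mean multipath fading term in dB
    """
    if distance <= 0.0:
        raise ValueError("distance must be strictly positive")
    # Assumed reading of (2): alpha_t^i = -10 * log10(d_i(t)).
    alpha = -10.0 * math.log10(distance)
    return alpha * path_loss_exponent + shadowing_db + fading_db


# A node 100 m from the reference antenna, exponent 3, no shadowing or fading.
y = log_scale_observation(100.0, 3.0, 0.0, 0.0)
```

With the shadowing and fading terms set to zero, only the deterministic path loss contribution remains, which makes the role of the path loss exponent as the first state component easy to see.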

In similar fashion as in [25], the following further assumptions are made. First,
$\xi^i_t \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2_\xi)$, $\forall i \in \mathbb{N}^+_N$ and
$\forall t \in \mathbb{N}$. This is a simplified, although quite reasonable assumption. Also see [28].
Second, conditional on $X_t$,
$\sigma^i_t(X_t) \overset{i.d.}{\sim} \mathcal{N}(0, \eta^2(X_t))$,$^2$ $\forall i \in \mathbb{N}^+_N$.
This stems from the fact that $(\Upsilon^{SH}_i(X_t))^2$ is (base-10) log-normally distributed.
Additionally, for each set of positions of the network nodes in $\mathcal{S}$, it is assumed that the
members of the set $\{\sigma^i_t(X_t)\}_{i \in \mathbb{N}^+_N}$ constitute jointly normal, spatially
correlated random variables with conditional (on $X_t$) autocorrelation kernel, symmetric and
positive definite in $(\mathbf{p}_i(t), \mathbf{p}_j(t))$,

$$\mathcal{R}(\mathbf{p}_i(t), \mathbf{p}_j(t), \theta(X_t)) : \mathcal{S} \times \mathcal{S} \times \mathbb{R}^{K \times 1} \mapsto \mathbb{R}, \quad (3)$$

for $(i, j) \in \mathbb{N}^+_N \times \mathbb{N}^+_N$, where $K$ denotes the dimension of the state
dependent parameter vector $\theta(X_t)$, with

$$\mathcal{R}(\mathbf{p}_i(t), \mathbf{p}_i(t), \theta(X_t)) = \eta^2(X_t), \quad \forall i \in \mathbb{N}^+_N. \quad (4)$$

1 See [25] and references therein for arguments about the validity of this assumption. Also,
throughout the paper, the samples of a discrete time white stochastic process are understood to be
independent.

2 In what follows, “i.d.” means “identically distributed” and “i.i.d.” means “independent and i.d.”.


Therefore, the $(i, j)$-th entry of the time evolving, conditional (on $X_t$) covariance matrix of the
random vector
$\sigma_t(X_t) \triangleq [\{\sigma^i_t(X_t)\}_{i \in \mathbb{N}^+_N}]^T \in \mathbb{R}^{N \times 1}$,

$$\Sigma_t(\theta(X_t)) \triangleq \mathbb{E}\{\sigma_t(X_t) (\sigma_t(X_t))^T \,|\, X_t\} \in \mathcal{D}_\Sigma \subset \mathbb{R}^{N \times N},$$

with $\mathcal{D}_\Sigma$ bounded, is defined as

$$\Sigma_t(\theta(X_t))(i, j) \triangleq \mathcal{R}(\mathbf{p}_i(t), \mathbf{p}_j(t), \theta(X_t)), \quad (5)$$

$\forall (i, j) \in \mathbb{N}^+_N \times \mathbb{N}^+_N$. For instance, for the heat kernel type of
autocorrelation function employed in [25] and proposed much earlier in [29], defined as

$$\mathcal{R}(\mathbf{p}_i(t), \mathbf{p}_j(t), \theta(X_t)) \equiv \mathcal{R}(d_{ij}(t), \theta(X_t)) \triangleq \theta_1(X_t) \exp\left(-\frac{d_{ij}(t)}{\theta_2(X_t)}\right), \quad (6)$$

where $d_{ij}(t) \triangleq \|\mathbf{p}_i(t) - \mathbf{p}_j(t)\|_2 \in \mathbb{R}_+$, we make the
identifications $\theta(X_t) \equiv [\theta_1(X_t) \; \theta_2(X_t)]^T$
with $\theta_1(X_t) \equiv \eta^2(X_t)$. The first parameter, $\theta_1(X_t)$, called the shadowing power,
controls the variance of the shadowing part of the channel, whereas the second, $\theta_2(X_t)$, called the
correlation distance, controls the decay rate of the spatial correlation between the channels
for each pair of network nodes. This simple isotropic model will be employed in the numerical
simulations presented in Section V.
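To make the construction of the shadowing covariance matrix concrete, here is a minimal Python sketch that fills the matrix entry by entry from an isotropic kernel. It assumes the exponential form $\theta_1 \exp(-d_{ij}/\theta_2)$ for the kernel in (6), which is an interpretation based on the description of $\theta_2$ as a correlation distance; the function name `shadowing_covariance` is hypothetical.

```python
import math


def shadowing_covariance(positions, shadowing_power, correlation_distance):
    """Build the N x N conditional shadowing covariance for nodes at `positions`.

    positions            -- list of (x, y) node coordinates in the plane
    shadowing_power      -- theta_1, the variance of each shadowing term
    correlation_distance -- theta_2 (> 0), controls how fast correlation decays
    """
    if correlation_distance <= 0.0:
        raise ValueError("correlation distance must be strictly positive")
    n = len(positions)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            d_ij = math.hypot(xi - xj, yi - yj)  # Euclidean distance between nodes
            # Assumed kernel: theta_1 * exp(-d_ij / theta_2).
            sigma[i][j] = shadowing_power * math.exp(-d_ij / correlation_distance)
    return sigma


# Three nodes on a line, 10 m apart, with theta_1 = 4 and theta_2 = 10.
cov = shadowing_covariance([(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)], 4.0, 10.0)
```

The diagonal entries equal the shadowing power, in agreement with (4), and the off-diagonal entries shrink as the inter-node distance grows relative to the correlation distance.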

In order to completely define an overall observation process for all nodes in the network,
we may stack the $N$ individual channel processes of (2), resulting in the vector additive
observation model

$$y_t \triangleq \alpha_t \mu(X_t) + \sigma_t(X_t) + \xi_t, \quad \forall t \in \mathbb{N}, \quad (7)$$

where $\sigma_t(X_t)$ is defined as above and $y_t \in \mathbb{R}^{N \times 1}$,
$\alpha_t \in \mathbb{R}^{N \times 1}$ and $\xi_t \in \mathbb{R}^{N \times 1}$ are defined
accordingly. The observation process (7) can also be rewritten in the canonical form
$y_t \equiv \alpha_t \mu(X_t) + \sqrt{C_t(X_t)}\, u_t$, $\forall t \in \mathbb{N}$, where
$u_t \equiv u_t(\omega)$ constitutes a standard Gaussian white
noise process and
$C_t(X_t) \triangleq \Sigma_t(\theta(X_t)) + \sigma^2_\xi I_{N \times N} \in \mathcal{D}_C \subset \mathbb{R}^{N \times N}$,
with $\mathcal{D}_C$ obviously bounded.

Let us now concentrate more on the time evolving underlying channel state process
$X_t \in \mathbb{R}^{M \times 1}$. In this work, we will assume that $X_t$ constitutes a Markov process
with known but nonlinear and (possibly) nonstationary dynamics, described by a possibly
nonstationary stochastic kernel$^3$
$\mathcal{K}_t : \mathscr{B}(\mathbb{R}^{M \times 1}) \times \mathbb{R}^{M \times 1} \mapsto [0, 1]$.
Also, we will make the generic and realistic assumption that the state is confined to a compact
strict subset of $\mathbb{R}^{M \times 1}$, that is, $\forall t \in \mathbb{N}$,
$X_t \in [a, b]^M \triangleq \mathcal{Z} \subset \mathbb{R}^{M \times 1}$, almost surely. Depending on
the available information, instead of using stochastic kernels, we may alternatively assume the
existence of an explicit state transition model expressing the temporal evolution of the state,
defined as $X_t \triangleq f_t(X_{t-1}, W_t) \in \mathcal{Z}$, $\forall t \in \mathbb{N}$, where,
for each $t$, $f_t : \mathcal{Z} \times \mathcal{W} \mapsto \mathcal{Z}$ constitutes a measurable
nonlinear state transition mapping with somewhat “favorable” analytical behavior (see below) and
$W_t \equiv W_t(\omega) \in \mathcal{W} \subseteq \mathbb{R}^{M_W \times 1}$, for
$t \in \mathbb{N}$, $\omega \in \Omega$, denotes a (discrete time) white noise process with known
measure and state space $\mathcal{W}$.

3 Throughout the paper, we use the intuitive notation
$\mathcal{K}_t(A \,|\, X_{t-1}(\omega) \equiv x) \equiv \mathcal{K}_t(A \,|\, x)$, for $A$ Borel.

From now on, in order to facilitate the presentation and without loss of generality, we will
drop the subscript “t” both in the stochastic kernels and transition mappings governing $X_t$,
therefore assuming stationarity of the state. Further, for mathematical simplicity, although
there are endless possibilities for defining the state dependent functions $\mu(X_t)$ and
$\theta(X_t)$, we will assume that $\mu(X_t) \equiv X_t(1) \in \mathbb{R}$ and
$\theta(X_t) \equiv [X_t(2) \, \ldots \, X_t(M)]^T \in \mathbb{R}^{(M-1) \times 1}$, also
in agreement with our intuition. From the previous discussion, it follows that the partially
observable system defined above constitutes an HMM and can be equivalently described by the
system of stochastic difference equations

$$\begin{cases} X_t \,|\, X_{t-1} \sim \mathcal{K}(X_t \in \mathrm{d}x \,|\, X_{t-1}) \;\text{ or }\; X_t \equiv f(X_{t-1}, W_t) \\ y_t \equiv A_t X_t + \sigma_t(X_t) + \xi_t \end{cases}, \quad \forall t \in \mathbb{N}, \quad (8)$$

where $A_t \triangleq [\alpha_t \;\; 0_{N \times (M-1)}] \in \mathbb{R}^{N \times M}$. In addition to
the above and in favor of supporting our analytical arguments presented in subsequent sections, we
make the following mild assumptions on the functional structure of the observation process of the
HMM described by (8).

Assumption 1: (Continuity & Expansiveness) All members of the functional family
$\{\Sigma^{ij}_t : \mathcal{Z} \mapsto \mathbb{R}\}_{t \in \mathbb{N}}$ are elementwise uniformly
Lipschitz continuous, that is, there exists some universal and bounded constant
$K_\Sigma \in \mathbb{R}_+$, such that, $\forall t \in \mathbb{N}$ and
$\forall (i, j) \in \mathbb{N}^+_N \times \mathbb{N}^+_N$,

$$\left|\Sigma^{ij}_t(x) - \Sigma^{ij}_t(y)\right| \le K_\Sigma \|x - y\|_1, \quad (9)$$

$\forall (x, y) \in \mathcal{Z} \times \mathcal{Z}$. If $x$ is substituted by the stochastic process
$X_t(\omega)$, then all the above statements continue to hold almost surely. Also, it is true that

$$\lambda_{inf} \triangleq \inf_{t \in \mathbb{N}} \inf_{x \in \mathcal{Z}} \lambda_{min}(C_t(x)) > 1, \quad (10)$$

a requirement which can always be satisfied by appropriate normalization of the observations.

For later reference (Section V), let us note that the isotropic autocorrelation kernel previously
defined by (6) can be very easily verified to satisfy the Lipschitz condition of Assumption 1,
simply considering the compactness of the state vector.

Remark 1. The assumption of $X_t$ satisfying the (first order) Markov property offers not
only analytical tractability, but also practical feasibility. For example, statistical inference in
higher order HMMs suffers from the curse of dimensionality so much that, most of the time,
the computational effort required for the implementation of basic state estimators is absolutely
prohibitive. On the other hand, our proposed formulation is based on general nonlinear models 
for describing the statistical behavior of the state, offering far greater flexibility, as well as 
modeling precision, compared to classical linear difference equations. From another point of 
view, it is well known that a tremendous amount of real world dynamical systems can be 
modeled using Markov processes and that, in cases where this is not entirely true, Markov 
processes usually constitute very good modeling approximations. 

Let us now define the problems of interest in this paper in a mathematically precise way. 
Hereafter, strict optimality will be meant to be in the Minimum Mean Square Error (MMSE) sense.
Also, in the following, the natural filtration generated by the causal observation process $y_t$ is
defined as

$$\{\mathscr{Y}_t\}_{t \in \mathbb{N}} \triangleq \left\{\sigma\left\{\{y_i\}_{i \in \mathbb{N},\, i \le t}\right\}\right\}_{t \in \mathbb{N}}, \quad (11)$$

where $\sigma\{A\}$ denotes the $\sigma$-algebra generated by the random element $A$.

Problem 1. (Sequential Channel State Tracking (SCST)) Develop a theoretically grounded,
sequential scheme for (approximately) evaluating the strictly optimal filter or $p$-step predictor of
the channel state $X_t$ on the basis of the available channel magnitude observations up to time
$t$, given by

$$\widehat{X}_{t+p} \triangleq \mathbb{E}\{X_{t+p} \,|\, \mathscr{Y}_t\}, \quad \forall t \in \mathbb{N}, \quad (12)$$

where $p \ge 0$ constitutes the prediction horizon. The computational complexity of the sequential
scheme must not grow as more observations become available.

Problem 2. (Sequential Spatiotemporal Channel Prediction (SSCP)) Develop a theoretically
grounded, sequential scheme for (approximately) evaluating the strictly optimal spatiotemporal
predictor of the channel magnitude at position $\mathbf{q} \in \mathbb{R}^2$ and time $t + p$
($p \ge 0$ is the prediction horizon) given the available channel magnitude observations up to time
$t$, expressed as

$$\widehat{y}_{t+p}(\mathbf{q}) \triangleq \mathbb{E}\{y_{t+p}(\mathbf{q}) \,|\, \mathscr{Y}_t\}, \quad \forall t \in \mathbb{N}. \quad (13)$$

Again, the computational complexity of the sequential scheme must not grow as more
observations become available.

Remark 2. The SSCP problem is clearly related to the channel predictability framework of
[25]. In fact, in this paper, we use almost the same channel description (observation process).
However, our proposed framework is philosophically different and potentially more general
than that proposed in [25], since the underlying channel dynamics (the channel state) are time
varying and the considered estimation and prediction problems are formulated in a Bayesian 
sense. 

As we will see later in Section IV, the SSCP problem can be solved sequentially using the
respective sequential solution of the SCST problem. However, unfortunately, it is well known
that, except for some very special cases such as those where the state process $X_t$ satisfies
a linear recursion or where it constitutes a Markov chain (discrete state space) [30]-[33], the
respective nonlinear filtering and prediction problems do not admit any known sequential (in
particular, recursive) representation [26], [34]. Therefore, in order to solve the SCST problem
defined above, one typically has to rely on carefully designed and robust approximations to
the problem of nonlinear filtering of Markov processes in discrete time, focusing on the class
of systems (HMMs) described by (8). This is exactly the subject of the next section.


III. Asymptotically Optimal Recursive Filtering & Prediction of Markov 
Processes: Prior Results & Preliminaries 

In the following, we present a number of important results in asymptotically optimal,
approximate recursive filtering of Markov processes, recently presented in [35]. These results will
provide us with the required mathematical tools for attacking the SCST and SSCP problems,
defined previously in Section II.

A. Uniform State Quantizations

From Section II, we have assumed that $X_t \in \mathcal{Z} \triangleq [a, b]^M$,
$\forall t \in \mathbb{N}$, a.s., where, geometrically, $\mathcal{Z}$ constitutes an $M$-hypercube,
representing the compact set of support of the state $X_t$. Let us discretize $\mathcal{Z}$ into
$L_S \triangleq L^M$ hypercubic $M$-dimensional cells of identical volume (each
dimension is partitioned into $L$ intervals). The center of mass of the $l$-th cell is denoted
as $x^l_{L_S}$, $l \in \mathbb{N}^+_{L_S}$. Then, letting
$\mathcal{X}_{L_S} \triangleq \{x^l_{L_S}\}_{l \in \mathbb{N}^+_{L_S}}$, the quantizer
$Q_{L_S} : (\mathcal{Z}, \mathscr{B}(\mathcal{Z})) \mapsto (\mathcal{X}_{L_S}, 2^{\mathcal{X}_{L_S}})$
is defined as the bijective and measurable function which uniquely maps the
$l$-th cell to the respective reconstruction point $x^l_{L_S}$, $\forall l \in \mathbb{N}^+_{L_S}$,
according to some predefined ordering. That is, $Q_{L_S}(x) \triangleq x^l_{L_S}$ if and only if $x$
belongs to the respective cell (for a detailed and more formal definition, see [35]). Having defined
the quantizer $Q_{L_S}(\cdot)$, we consider
the following discrete state space approximations of the process $X_t$ [36]:

• The Markovian Quantization of the state, defined as

$$\overline{X}^{L_S}_t \triangleq Q_{L_S}\left(f\left(\overline{X}^{L_S}_{t-1}, W_t\right)\right) \in \mathcal{X}_{L_S}, \quad \forall t \in \mathbb{N}, \quad (14)$$

where we have explicitly assumed a priori knowledge of a transition mapping, modeling
the temporal evolution of the Markov process $X_t$, and

• The Marginal Quantization of the state, defined as

$$\widetilde{X}^{L_S}_t \triangleq Q_{L_S}(X_t) \in \mathcal{X}_{L_S}, \quad \forall t \in \mathbb{N}. \quad (15)$$

Additionally, for later reference, define the column stochastic matrices
$\overline{P} \in [0, 1]^{L_S \times L_S}$ and $\widetilde{P} \in [0, 1]^{L_S \times L_S}$ as

$$\overline{P}(i, j) \triangleq \mathcal{P}\left(\overline{X}^{L_S}_t \equiv x^i_{L_S} \,\middle|\, \overline{X}^{L_S}_{t-1} \equiv x^j_{L_S}\right) \quad \text{and} \quad (16)$$

$$\widetilde{P}(i, j) \triangleq \mathcal{P}\left(\widetilde{X}^{L_S}_t \equiv x^i_{L_S} \,\middle|\, \widetilde{X}^{L_S}_{t-1} \equiv x^j_{L_S}\right), \quad (17)$$

$\forall (i, j) \in \mathbb{N}^+_{L_S} \times \mathbb{N}^+_{L_S}$, obviously related to the Markovian and
marginal state quantizations, respectively. Due to its structure, $\overline{P}$ can at least be
constructed by simulating $\overline{X}^{L_S}_t$. From the
Law of Large Numbers, the entries of $\overline{P}$ can be estimated with arbitrary precision from a
sufficiently large number of realizations of $\overline{X}^{L_S}_t$, $t \in \mathbb{N}_T$, for some
$T < \infty$. Similarly, $\widetilde{P}$ can also be estimated with arbitrary precision from multiple
realizations of $\widetilde{X}^{L_S}_t$, which constitutes a
deterministic functional of the true state $X_t$. Note, however, that in this case, it is possible
to obtain $\widetilde{P}$ using only available realizations of the state, without actually knowing
either the stochastic kernel or the transition mapping of $X_t$ (if such exists). For example, this
could be made possible in sufficiently controlled physical experiments, specially designed for system
identification, where the state $X_t$ would be a fully observable stochastic process.
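The simulation-based construction of the Markovian transition matrix can be sketched as follows for a scalar state (M = 1). This is only an illustration under stated assumptions: the helper names (`cell_centers`, `quantize_index`, `estimate_markovian_P`) and the clipped random walk used as a transition mapping are hypothetical and are not taken from the paper.

```python
import random


def cell_centers(a, b, L):
    """Centers of the L equal-width cells partitioning [a, b]."""
    width = (b - a) / L
    return [a + (k + 0.5) * width for k in range(L)]


def quantize_index(x, a, b, L):
    """Index of the cell of [a, b] containing x (the right endpoint goes to the last cell)."""
    k = int((x - a) / (b - a) * L)
    return min(max(k, 0), L - 1)


def estimate_markovian_P(f, sample_noise, a, b, L, n_samples=2000, seed=0):
    """Monte Carlo estimate of the column stochastic matrix of the Markovian quantization.

    Column j holds the empirical distribution of Q(f(x_j, W)) over the cells,
    where x_j is the j-th cell center and W is drawn by `sample_noise(rng)`.
    """
    rng = random.Random(seed)
    centers = cell_centers(a, b, L)
    P = [[0.0] * L for _ in range(L)]
    for j, x_j in enumerate(centers):
        for _ in range(n_samples):
            i = quantize_index(f(x_j, sample_noise(rng)), a, b, L)
            P[i][j] += 1.0 / n_samples
    return P


def clipped_walk(x, w):
    """Hypothetical transition mapping: a random walk confined to [0, 1]."""
    return min(max(x + w, 0.0), 1.0)


P = estimate_markovian_P(clipped_walk, lambda rng: rng.gauss(0.0, 0.1), 0.0, 1.0, 8)
```

Each column of the estimate sums to one by construction, matching the column stochastic convention of (16). For M > 1 the same idea applies with a multi-index over the cells, at the cost of a grid whose size grows as $L^M$.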

B. Asymptotically Optimal Recursive Estimators 

First, let us introduce the concept of conditional regularity of stochastic kernels and
stochastic processes, which, as we will see below, plays an important role in the asymptotic
consistency of a special class of approximate state estimators, based on the marginal state
quantization discussed above.

Definition 1. (Conditional Regularity of Stochastic Kernels) Consider the stochastic
(or Markov) kernel
$\mathcal{K} : \mathscr{B}(\mathbb{R}^{M \times 1}) \times \mathbb{R}^{M \times 1} \mapsto [0, 1]$,
associated with the process $Y_t(\omega) \in \mathbb{R}^{M \times 1}$,
for all $t \in \mathbb{N}$. We say that $\mathcal{K}(\cdot \,|\, \cdot)$ is Conditionally Regular of
Type I (CRT I), if, for almost all $x \in \mathcal{Z}$, there exists a bounded sequence
$\delta_n(x) \in \mathbb{R}_+$, $n \in \mathbb{N}^+$, such that, for $A$ ranging over the
quantization cells $\{\mathcal{Z}^l_{L_S}\}_{l \in \mathbb{N}^+_{L_S}}$,

$$\sup_{A \in \{\mathcal{Z}^l_{L_S}\}_{l \in \mathbb{N}^+_{L_S}}} \left|\mathcal{K}(A \,|\, x) - \mathcal{K}(A \,|\, Q_{L_S}(x))\right| \le \delta_{L_S}(x), \quad \text{with} \quad \lim_{L_S \to \infty} \delta_{L_S}(x) = 0, \;\text{a.e.} \quad (18)$$

If, additionally, for almost all $x \in \mathbb{R}^{M \times 1}$, the Borel probability measure
$\mathcal{K}(\cdot \,|\, x)$ admits a stochastic kernel density
$\kappa : \mathbb{R}^{M \times 1} \times \mathbb{R}^{M \times 1} \mapsto [0, 1]$, suggestively denoted
as $\kappa(y \,|\, x)$, and if the condition

$$\sup_{y \in \mathbb{R}^{M \times 1}} \left|\kappa(y \,|\, x) - \kappa(y \,|\, Q_{L_S}(x))\right| \le \delta_{L_S}(x), \quad \text{with} \quad \lim_{L_S \to \infty} \delta_{L_S}(x) = 0, \;\text{a.e.} \quad (19)$$

is satisfied, then we say that $\mathcal{K}(\cdot \,|\, \cdot)$ is Conditionally Regular of Type II
(CRT II). In any of the two cases, we will also say that $Y_t$ is conditionally regular,
interchangeably.

We are now ready to present the following two central results, establishing that, under
Assumption 1 and certain mild assumptions on the nature of the state process $X_t$, it
is possible to approximate the strictly optimal nonlinear filter/predictor $\widehat{X}_{t+p}$ by a
simple recursive filtering scheme, being formally similar to the MMSE optimal filter of a Markov chain
with finite state space. The resulting approximate filter is strongly theoretically consistent,
in the sense that it converges to the true optimal filter of the state uniformly inside each
fixed finite time interval and uniformly in a measurable set consisting of possible outcomes
occurring with probability almost 1.

The results are presented below. The proofs are omitted, since each one of them essentially
constitutes a fusion of several related results recently presented by the authors in [35]. In the
following,
$Q^e_{L_S} : (\mathcal{X}_{L_S}, 2^{\mathcal{X}_{L_S}}) \mapsto (\mathcal{B}_{L_S}, 2^{\mathcal{B}_{L_S}})$
constitutes a unique bijective mapping between the sets $\mathcal{X}_{L_S}$ and
$\mathcal{B}_{L_S} \triangleq \{e^l_{L_S}\}_{l \in \mathbb{N}^+_{L_S}}$, where the latter contains as
elements the complete standard basis in $\mathbb{R}^{L_S \times 1}$.


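Theorem 1. (Approximate Filtering of Markov Processes [35]) Define the reconstruction
and likelihood matrices as

$$\mathbf{X} \triangleq \left[ x^{1}_{L_S} \; \cdots \; x^{L_S}_{L_S} \right] \in \mathbb{R}^{M \times L_S} \quad \text{and} \tag{20}$$

$$\Lambda_t \triangleq \mathrm{diag}\left( \left\{ \Lambda_t\left( x^{j}_{L_S} \right) \right\}_{j \in \mathbb{N}^+_{L_S}} \right) \in \mathbb{R}^{L_S \times L_S}, \tag{21}$$

respectively, where, for all $t \in \mathbb{N}$, $\Lambda_t(x^{j}_{L_S})$ is given by

$$\Lambda_t\left( x^{j}_{L_S} \right) \triangleq \frac{\exp\left( -\frac{1}{2} \left( \mathbf{y}_t - A_t x^{j}_{L_S} \right)^T \left( \Sigma_t\left( x^{j}_{L_S} \right) + \sigma_\xi^2 I_{N \times N} \right)^{-1} \left( \mathbf{y}_t - A_t x^{j}_{L_S} \right) \right)}{\sqrt{\det\left( \Sigma_t\left( x^{j}_{L_S} \right) + \sigma_\xi^2 I_{N \times N} \right)}}. \tag{24}$$

Then, the strictly optimal filter and $p$-step predictor of the state process $X_t$ can be approximated as

$$\mathcal{E}^{L_S}\left( X_{t+p} \middle| \mathscr{Y}_t \right) \triangleq \mathbf{X} P^{p} \frac{E_t}{\|E_t\|_1}, \quad \forall t \in \mathbb{N}, \tag{22}$$

and for all finite prediction horizons $p \ge 0$, where the process $E_t \in \mathbb{R}^{L_S \times 1}$ on the RHS of (22)
satisfies the simple linear recursion

$$E_t \triangleq \Lambda_t P E_{t-1}, \quad \forall t \in \mathbb{N}, \tag{23}$$

and where $P$ is the transition matrix induced by

• the Markovian quantization of the state, or

• the marginal quantization of the state,

depending on the type of quantization employed. The approximate filter $\mathcal{E}^{L_S}(X_t | \mathscr{Y}_t)$ is initialized at time $t = -1$, setting $E_{-1} \triangleq \mathbb{E}\{ \mathcal{Q}^{e}_{L_S}( X^{L_S}_{-1} ) \}$
for the Markovian quantization and $E_{-1} \triangleq \mathbb{E}\{ \mathcal{Q}^{e}_{L_S}(\cdot) \}$, evaluated at the marginally quantized initial state, for the marginal quantization of
the state.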
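As a rough illustration of how the diagonal of the likelihood matrix in (21) might be evaluated in practice, the following Python sketch computes the Gaussian likelihood of (24) at every reconstruction point. It assumes NumPy; the function name `likelihood_diagonal`, the callable `shadow_cov` (standing in for the state-dependent covariance $\Sigma_t(\cdot)$) and the argument `noise_var` (standing in for the noise variance added on the diagonal) are hypothetical names introduced here for illustration only. The computation is done in the log domain and rescaled by a common factor, which leaves the normalized filter in (22) unchanged because the recursion (23) is linear.

```python
import numpy as np

def likelihood_diagonal(y, A, grid, shadow_cov, noise_var):
    """Return the diagonal of Lambda_t, up to a common positive scale factor.

    y          : (N,) observation vector y_t
    A          : (N, M) observation matrix A_t
    grid       : (M, L_S) reconstruction points, one per column
    shadow_cov : callable x -> (N, N) covariance matrix (hypothetical helper)
    noise_var  : scalar noise variance added on the diagonal
    """
    N = y.shape[0]
    log_lam = np.empty(grid.shape[1])
    for j in range(grid.shape[1]):
        x = grid[:, j]
        C = shadow_cov(x) + noise_var * np.eye(N)
        r = y - A @ x                              # residual y_t - A_t x^j
        _, logdet = np.linalg.slogdet(C)           # numerically stable log det
        log_lam[j] = -0.5 * r @ np.linalg.solve(C, r) - 0.5 * logdet
    # Subtracting the maximum only rescales all entries by the same factor.
    return np.exp(log_lam - log_lam.max())
```

Working with `slogdet` and `solve` avoids forming an explicit inverse and an explicit determinant, both of which can overflow or underflow when the number of sensors grows.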


Theorem 2. (Asymptotic Optimality of Approximate Filters [35]) Pick any natural $T < \infty$
and suppose either of the following:

• The Markovian quantization is employed, whose initial value coincides with that of $X_t$, and
the transition mapping of the state, $f : \mathcal{Z} \times \mathcal{W} \mapsto \mathcal{Z}$, is Lipschitz in $\mathcal{Z}$, for every element
of $\mathcal{W}$.

• The marginal quantization is employed and $X_t$ is conditionally regular.

Then, for any finite prediction horizon $p \ge 0$, there exists a measurable subset $\widehat{\Omega}_T \subseteq \Omega$ with
$\mathcal{P}$-measure at least $1 - (T+1)^{1-CN} \exp(-CN)$, such that

$$\sup_{t \in \mathbb{N}_T} \sup_{\omega \in \widehat{\Omega}_T} \left\| \mathcal{E}^{L_S}\left( X_{t+p} \middle| \mathscr{Y}_t \right) - \mathbb{E}\left\{ X_{t+p} \middle| \mathscr{Y}_t \right\} \right\|_1 \underset{L_S \to \infty}{\longrightarrow} 0, \tag{25}$$

for any free, finite constant $C \ge 1$. In other words, the convergence of the respective approximate
filters is compact in $t \in \mathbb{N}$ and, with probability at least $1 - (T+1)^{1-CN} \exp(-CN)$, uniform
in $\omega$.

Note that while the approximate filters/predictors described in Theorem 1 are structurally
very simple, they converge to the respective optimal nonlinear state estimators in a particularly
strong sense (under the respective conditions), as Theorem 2 clearly suggests.


IV. SCST & SSCP in Mobile Wireless Networks 

In this section, we present the main results of the paper. In a nutshell, we propose two
theoretically consistent sequential algorithms for approximately solving the SCST and SSCP
problems defined in Section II.B, both derived as applications of Theorem 1 presented in
Section III.

Algorithm 1 Sequential Channel State Tracking

1) Choose $P$ and $E_{-1}$ depending on the type of state quantization employed (Markovian or
marginal).

2) Choose $p \ge 0$ and recall $P$ and $\mathbf{X} P^{p}$ from memory.

3) For $t = 0, 1, \ldots$ do

4) Compute the diagonal matrix $\Lambda_t$ from (24).

5) Compute & store $E_t = \Lambda_t P E_{t-1}$ until the next iteration.

6) Normalize $E_t$ as $E_t^{N} = E_t / \|E_t\|_1$.

7) Compute & output the estimate $\mathcal{E}^{L_S}(X_{t+p} | \mathscr{Y}_t) = \mathbf{X} P^{p} E_t^{N}$.

8) endFor


A. SCST 

At this point, it is apparent that Theorem 1 in fact directly provides us with an effective
approximate and recursive estimator for the channel state $X_t$. Therefore, Theorem 1 immediately
solves the SCST problem, since the resulting filtering/prediction scheme is sequential
and, as new channel measurements become available, its computational complexity is fixed,
due to time invariance of the type of numerical operations required for each filter update.

Specifically, Algorithm 1 shows the discrete steps required for the centralized implementation
of the proposed filtering scheme, in a relatively powerful fusion center. Observe that,
depending on the type of quantization employed and for each $p \ge 0$, the required matrices $P$
and $\mathbf{X} P^{p}$ can be computed offline and stored in memory.
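A minimal Python sketch of the recursion in Algorithm 1 is given below. It assumes NumPy, a column-stochastic transition matrix (so that multiplying it with a probability vector is a valid propagation step), and per-cell likelihood values that have already been computed from (24). The function name `track_state` and the toy two-cell example are hypothetical. Unlike Step 5 of the algorithm, the sketch keeps the normalized vector between iterations; since the recursion is linear, this changes the stored vector only by a positive scale factor and leaves the output unaffected, while avoiding numerical underflow.

```python
import numpy as np

def track_state(X, P, E_init, likelihood_seq, p=0):
    """Yield the approximate p-step state predictor for t = 0, 1, ...

    X              : (M, L_S) reconstruction matrix, one grid point per column
    P              : (L_S, L_S) transition matrix (assumed column-stochastic)
    E_init         : (L_S,) initial vector E_{-1}
    likelihood_seq : iterable of (L_S,) arrays, the diagonal of Lambda_t
    """
    XPp = X @ np.linalg.matrix_power(P, p)   # Step 2: computed once, offline
    E = np.asarray(E_init, dtype=float)
    for lam in likelihood_seq:                # Step 3
        E = lam * (P @ E)                     # Step 5: Lambda_t is diagonal
        E = E / np.sum(np.abs(E))             # Step 6: l1 normalization
        yield XPp @ E                         # Step 7

# Toy example (hypothetical numbers): two cells with scalar centers 0 and 1.
X = np.array([[0.0, 1.0]])
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
E_init = np.array([0.5, 0.5])
observed = [np.array([1.0, 0.0])]            # evidence rules out the second cell
filtered = list(track_state(X, P, E_init, observed, p=0))
predicted = list(track_state(X, P, E_init, observed, p=1))
```

Because the likelihood matrix is diagonal, it is applied as an elementwise product, so the only quadratic-cost operation per step is the matrix vector product with the transition matrix.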


B. SSCP 

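Defining the natural filtration generated by both the state $X_t$ and the observations $\mathbf{y}_t$ as

$$\mathscr{H}_t \triangleq \sigma\left\{ \left\{ X_i, \mathbf{y}_i \right\}_{i \in \mathbb{N}_t} \right\}, \tag{26}$$

and using the tower property of expectations, it is true that

$$\widehat{y}_t(\mathbf{q}) \equiv \mathbb{E}\left\{ y_t(\mathbf{q}) \middle| \mathscr{Y}_t \right\} = \mathbb{E}\left\{ \mathbb{E}\left\{ y_t(\mathbf{q}) \middle| \mathscr{H}_t \right\} \middle| \mathscr{Y}_t \right\}, \quad \forall t \in \mathbb{N}. \tag{27}$$

Let us define the quantities

$$\alpha_{\mathbf{q}} \triangleq -10 \log_{10}\left( \left\| \mathbf{q} - \mathbf{p}_{ref} \right\|_2 \right), \tag{28}$$

$$\sigma_{\mathbf{q}}(X_t) \,|\, X_t \sim \mathcal{N}\left( 0, \eta^2(X_t) \right) \quad \text{and} \tag{29}$$

$$\xi_{\mathbf{q}} \sim \mathcal{N}\left( 0, \sigma_\xi^2 \right), \tag{30}$$

where, also by definition,

$$\mathbb{E}\left\{ \begin{bmatrix} \sigma_t(X_t) \\ \sigma_{\mathbf{q}}(X_t) \end{bmatrix} \begin{bmatrix} \sigma_t(X_t) \\ \sigma_{\mathbf{q}}(X_t) \end{bmatrix}^T \middle| X_t \right\} \triangleq \begin{bmatrix} \Sigma_t(X_t) & \sigma^{\mathbf{q}}_t(X_t) \\ \left( \sigma^{\mathbf{q}}_t(X_t) \right)^T & \eta^2(X_t) \end{bmatrix}, \tag{31}$$

with each element of $\sigma^{\mathbf{q}}_t(X_t) \in \mathbb{R}^{N}$ given by

$$\sigma^{\mathbf{q}}_t(X_t)(j) \triangleq \Sigma\left( \mathbf{q}, \mathbf{p}_j(t), \theta(X_t) \right), \quad \forall j \in \mathbb{N}^+_N. \tag{32}$$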

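Then, it must be true that

$$y_t(\mathbf{q}) \equiv \alpha_{\mathbf{q}} X_t(1) + \sigma_{\mathbf{q}}(X_t) + \xi_{\mathbf{q}} = A_{\mathbf{q}} X_t + \sigma_{\mathbf{q}}(X_t) + \xi_{\mathbf{q}}, \tag{33}$$

where $A_{\mathbf{q}} \triangleq \left[ \alpha_{\mathbf{q}} \;\; 0_{1 \times (M-1)} \right] \in \mathbb{R}^{1 \times M}$, since $y_t(\mathbf{q})$ can be equivalently considered as an additional
observation, measured by an imaginary sensor at position $\mathbf{q}$, which of course was not
used for state estimation in the SCST problem treated above. Under these considerations, the
inner conditional expectation of (27) can be expressed as

$$\mathbb{E}\left\{ y_t(\mathbf{q}) \middle| \mathscr{H}_t \right\} = \mathbb{E}\left\{ A_{\mathbf{q}} X_t + \sigma_{\mathbf{q}}(X_t) + \xi_{\mathbf{q}} \middle| \mathscr{H}_t \right\} = A_{\mathbf{q}} X_t + \mathbb{E}\left\{ \sigma_{\mathbf{q}}(X_t) \middle| \mathscr{H}_t \right\}, \tag{34}$$

or, using well known properties of jointly Gaussian random vectors [37] (also used in [?]),

$$\mathbb{E}\left\{ y_t(\mathbf{q}) \middle| \mathscr{H}_t \right\} = A_{\mathbf{q}} X_t + \left( \sigma^{\mathbf{q}}_t(X_t) \right)^T C_t^{-1}(X_t) \left( \mathbf{y}_t - A_t X_t \right) \triangleq \phi_t\left( X_t, \mathbf{y}_t \right). \tag{35}$$

As a result, $\widehat{y}_t(\mathbf{q})$ can be expressed as

$$\widehat{y}_t(\mathbf{q}) \equiv \mathbb{E}\left\{ \phi_t\left( X_t, \mathbf{y}_t \right) \middle| \mathscr{Y}_t \right\}, \tag{36}$$

that is, the SSCP problem coincides with the problem of sequentially evaluating the optimal
nonlinear filter of a particular functional of the state and the observations, $\phi_t(\cdot, \cdot)$. In this
respect, the following result is true, which, together with the results presented in Section III,
has also been formulated previously by the authors in [35].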
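The conditional mean in (35) is the standard Gaussian conditioning formula: the unconditional prediction at the new position plus a correction proportional to the residual of the existing sensor measurements. A small Python sketch follows, assuming NumPy and that all matrices have already been evaluated at a given state value; the function name `phi` and the numbers in the example are hypothetical.

```python
import numpy as np

def phi(x, y, A, A_q, C, cross_cov):
    """Evaluate A_q x + cross_cov^T C^{-1} (y - A x), cf. (35).

    x         : (M,) state value
    y         : (N,) observation vector
    A         : (N, M) observation matrix of the sensors
    A_q       : (M,) row of the imaginary sensor at position q, as a 1-D array
    C         : (N, N) covariance of the sensor observations given the state
    cross_cov : (N,) cross-covariance between position q and the sensors
    """
    residual = y - A @ x
    # Solve a linear system instead of forming the explicit inverse of C.
    return float(A_q @ x + cross_cov @ np.linalg.solve(C, residual))

# Hypothetical numbers: two sensors, two-dimensional state.
x = np.array([2.0, 1.0])
A = np.array([[1.0, 0.0],
              [2.0, 0.0]])
A_q = np.array([3.0, 0.0])
C = 2.0 * np.eye(2)
cross_cov = np.array([1.0, 1.0])
y = np.array([3.0, 5.0])
value = phi(x, y, A, A_q, C, cross_cov)
```

When the cross-covariance vanishes, that is, when position q is uncorrelated with every sensor, the correction term disappears and only the path loss part remains.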


Theorem 3. (Approximate Filtering for Functionals of the State / Separation Theorem [35])
For any deterministic functional family $\left\{ \phi_t : \mathbb{R}^{M \times 1} \mapsto \mathbb{R}^{M_\phi \times 1} \right\}_{t \in \mathbb{N}}$ with bounded and
continuous members and any finite prediction horizon $p \ge 0$, the strictly optimal filter and
$p$-step predictor of the transformed process $\phi_t(X_t)$ can be approximated as

$$\mathcal{E}^{L_S}\left( \phi_{t+p}\left( X_{t+p} \right) \middle| \mathscr{Y}_t \right) \triangleq \Phi_{t+p} P^{p} \frac{E_t}{\|E_t\|_1}, \tag{37}$$

for all $t \in \mathbb{N}$, where the process $E_t \in \mathbb{R}^{L_S \times 1}$ can be recursively evaluated as in Theorem 1 and

$$\Phi_{t+p} \triangleq \left[ \phi_{t+p}\left( x^{1}_{L_S} \right) \; \cdots \; \phi_{t+p}\left( x^{L_S}_{L_S} \right) \right] \in \mathbb{R}^{M_\phi \times L_S}. \tag{38}$$

In the above, the transition matrix $P$ and the initialization of the approximate filter are exactly
the same as in Theorem 1. Additionally, the approximate filter is asymptotically optimal under
the same conditions and in the same sense as in Theorem 2.

Algorithm 2 Sequential Spatiotemporal Channel Prediction

1) Choose $P$ and $E_{-1}$ depending on the type of state quantization employed (Markovian or
marginal).

2) Choose $p \ge 0$ and recall $P$ and $\mathbf{X} P^{p}$ from memory.

3) Choose an arbitrary $\mathbf{q} \in \mathbb{R}^2$.

4) For $t = 0, 1, \ldots$ do

5) Compute the diagonal matrix $\Lambda_t$ from (24).

6) Compute $E_t = \Lambda_t P E_{t-1}$ & store it until the next iteration.

7) Normalize $E_t$ as $E_t^{N} = E_t / \|E_t\|_1$.

8) Compute vector $\phi_t(\mathbf{y}_t)$ using current observations from (35) and (40).

9) Compute & output $\mathcal{E}^{L_S}(y_{t+p}(\mathbf{q}) | \mathscr{Y}_t)$ as

$$\mathcal{E}^{L_S}\left( y_{t+p}(\mathbf{q}) \middle| \mathscr{Y}_t \right) = \begin{cases} \left( \phi_t(\mathbf{y}_t) \right)^T E_t^{N}, & p = 0 \\ A_{\mathbf{q}} \mathbf{X} P^{p} E_t^{N}, & p \ge 1 \end{cases}$$

10) endFor


Invoking Theorem 3, the following result is true, providing a closed form approximate solution
to the SSCP problem, at the same time enjoying asymptotic optimality in the sense of
Theorem 2.


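Theorem 4. (Approximate Solution to the SSCP Problem) The strictly optimal spatiotemporal
predictor of the channel magnitude at an arbitrary position $\mathbf{q} \in \mathbb{R}^2$, $\widehat{y}_{t+p}(\mathbf{q})$, can be approximated
as

$$\mathcal{E}^{L_S}\left( y_{t+p}(\mathbf{q}) \middle| \mathscr{Y}_t \right) \triangleq \begin{cases} \left( \phi_t(\mathbf{y}_t) \right)^T \dfrac{E_t}{\|E_t\|_1}, & p = 0 \\[2mm] A_{\mathbf{q}} \mathbf{X} P^{p} \dfrac{E_t}{\|E_t\|_1}, & p \ge 1 \end{cases} \tag{39}$$

for all $t \in \mathbb{N}$, where the process $E_t \in \mathbb{R}^{L_S \times 1}$ can be recursively evaluated as in Theorem 1 and
where the stochastic process $\phi_t(\mathbf{y}_t) \in \mathbb{R}^{L_S \times 1}$ is defined as

$$\phi_t(\mathbf{y}_t) \triangleq \left[ \phi_t\left( x^{1}_{L_S}, \mathbf{y}_t \right) \; \cdots \; \phi_t\left( x^{L_S}_{L_S}, \mathbf{y}_t \right) \right]^T, \tag{40}$$

with $\phi_t : \mathbb{R}^{M \times 1} \times \mathbb{R}^{N \times 1} \mapsto \mathbb{R}$ defined as in (35). In the above, the transition matrix $P$ and the
initialization of the approximate filter are exactly the same as in Theorem 1. Additionally, under
the same conditions as in Theorem 2, it is true that

$$\sup_{t \in \mathbb{N}_T} \sup_{\omega \in \widehat{\Omega}_T} \left| \mathcal{E}^{L_S}\left( y_{t+p}(\mathbf{q}) \middle| \mathscr{Y}_t \right) - \widehat{y}_{t+p}(\mathbf{q}) \right| \underset{L_S \to \infty}{\longrightarrow} 0. \tag{41}$$

Proof of Theorem 4: See Appendix.

Algorithm 2 shows the discrete steps required for the centralized implementation of the
proposed approximate spatial prediction scheme.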
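The output step of Algorithm 2 can be sketched in a few lines of Python. The sketch assumes NumPy, that the filter vector has already been propagated by the recursion of Algorithm 1, and that the vector of per-cell conditional means from (40) is supplied by the caller; the function name `predict_gain` and the toy numbers are hypothetical.

```python
import numpy as np

def predict_gain(E, phi_vec, A_q, X, P, p=0):
    """Approximate predictor of the channel gain at position q, cf. (39).

    E       : (L_S,) unnormalized filter vector E_t
    phi_vec : (L_S,) per-cell conditional means (used only when p == 0)
    A_q     : (M,) row of the imaginary sensor at q, as a 1-D array
    X       : (M, L_S) reconstruction matrix
    P       : (L_S, L_S) transition matrix (assumed column-stochastic)
    """
    E_n = E / np.sum(np.abs(E))                     # l1 normalization
    if p == 0:
        # Filtering: average the per-cell conditional means.
        return float(phi_vec @ E_n)
    # Prediction: only the path loss part of the state carries over.
    return float(A_q @ X @ np.linalg.matrix_power(P, p) @ E_n)

# Toy example (hypothetical numbers): two cells, scalar state.
E = np.array([2.0, 2.0])
phi_vec = np.array([1.0, 3.0])
A_q = np.array([1.0])
X = np.array([[0.0, 1.0]])
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
gain_now = predict_gain(E, phi_vec, A_q, X, P, p=0)
gain_next = predict_gain(E, phi_vec, A_q, X, P, p=1)
```

The two branches mirror the two cases of (39): for zero horizon the current measurements enter through the per-cell conditional means, while for positive horizons the prediction depends on the measurements only through the filter vector.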

C. Computational Complexity: A Fair Comparison 

A careful inspection of Algorithms 1 and 2 proposed earlier for the solution of the SCST and
SSCP problems, respectively, reveals that, in the worst case, the computational complexity of
both algorithms scales as $O\left( L_S^2 + L_S N^3 \right)$. The two algorithms can also be combined into one
with the same computational requirements. The cubic term related to the number of sensors
in the network is due to the inversion and the determinant calculation of the covariance
matrices, for each reconstruction point $x^{j}_{L_S}$, $j \in \mathbb{N}^+_{L_S}$, and at each time instant $t \in \mathbb{N}$, and it is
computationally bearable, at least for a relatively small number of sensors. However, note that
when the sensors are stationary or even when their trajectories are fixed and known a priori,
the aforementioned computationally demanding operations may be completely bypassed by
precomputing the required matrices for each set of parameters and storing them in memory.
In such a case, the computational complexity of both algorithms reduces significantly to

$$O\left( L_S^2 + L_S N^2 \right).$$

Temporarily considering the number of sensors $N$ as constant and focusing solely on the
number of quantization regions $L_S$, it is apparent that the complexity of both algorithms
considered scales as $O\left( L_S^2 \right)$. This can be very large if one considers the same quantization
resolution in each dimension of the Euclidean space the state process lives in, that is, as
in Section III.A, where, for simplicity, we considered a completely uniform strictly hypercubic
quantizer on the set $[a, b]^M$. Specifically, in this general case, where we “pay the same attention”
to all points in the $M$-hypercube of interest, the overall complexity scales as $O\left( L_S^2 \equiv L^{2M} \right)$,
which is prohibitively large for high dimensional systems. However, as
analyzed in [35], nothing prevents us from considering either hyperrectangular Euclidean state
spaces, since, in most cases, each element of the state vector would have its own dynamic
range, or a different quantization resolution for each element, since they may not all have the
same importance in the particular engineering application of interest, or both.
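To give a feeling for these orders of growth, the short Python sketch below evaluates the two complexity expressions for a uniform hypercubic quantizer with the same resolution per dimension. The function name `grid_filter_cost` is hypothetical, and the returned numbers are order-of-magnitude operation counts with all constants ignored, not measured run times.

```python
def grid_filter_cost(L, M, N, precomputed=False):
    """Return (L_S, per-step operation count) for a uniform hypercubic grid.

    L           : quantization levels per state dimension
    M           : state dimension
    N           : number of sensors
    precomputed : True when covariance inverses/determinants are stored offline
    """
    L_S = L ** M                                   # number of quantization cells
    per_cell = N ** 2 if precomputed else N ** 3   # likelihood cost per cell
    return L_S, L_S ** 2 + L_S * per_cell

# Setting used later in the simulations: L = 30, M = 2, N = 30.
cells, worst_case = grid_filter_cost(30, 2, 30)           # 900 cells, 25,110,000
_, with_storage = grid_filter_cost(30, 2, 30, True)       # 1,620,000
# Same resolution in five dimensions: the quadratic term alone is about 5.9e14.
cells_5d, cost_5d = grid_filter_cost(30, 5, 30, True)
```

The comparison between the two-dimensional and five-dimensional cases makes the exponential dependence on the state dimension concrete, and shows why a coarser resolution on less important state components is attractive.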

In the particular problems we are interested in here, the dimension of the state process
is relatively low, that is, $M$ almost always ranges between 2 and 5, which
makes grid based filters practically feasible. Additionally, as we will see in Section V, where
we present the relevant numerical simulations, and as has already been clearly shown in
[25] in a non Bayesian framework, if one considers the simple isotropic autocorrelation kernel
described by (6) in Section II.B and is primarily interested in the SSCP problem, the sensitivity
of the quality of spatial channel prediction to the estimation error of the shadowing power
and decorrelation distance of the channel is indeed very low, making it possible to
consider a lower quantization resolution for the aforementioned quantities without significant
compromise in terms of the prediction quality. As a result, grid based approximate filters are
indeed adequate for the problems of interest in this paper, taking advantage of their strong
properties in terms of asymptotic consistency.

Naturally, particle filters [38], [39] would constitute the rivals of our grid based filtering
approach. Compared to grid based filters, particle filters exhibit a computational complexity of $O(L_S)$,
where, in this case, $L_S$ constitutes the number of particles [38]. That is, the complexity of
particle filters is one polynomial order lower (linear instead of quadratic in $L_S$) than the complexity of grid based
filters. Note, though, that in Algorithms 1 and 2, the one and only computational operation
which incurs a complexity of $O\left( L_S^2 \right)$ is the matrix vector operation $P E_{t-1}$. In fact, in the
numerical simulations conducted in Section V, it was revealed that, at least for the problems
of interest, the numerical operations of inversion and determinant computation of the respective
covariance matrices are far more computationally demanding than the aforementioned
matrix vector multiplication. These operations would also be required in a typical particle filter
implementation.

Continuing the comparison of the proposed approach with the filtering approximation family
of particle filters, another issue of major importance is filter behavior with respect to the curse
of dimensionality. Although in, for instance, [40] and [41], it was asserted that the
use of particle filters might indeed make it possible to beat the curse of dimensionality, it was
later made clear that this is not the case and that particle filters indeed suffer from the curse
[42]-[44]. More recently, it was shown that particle filters suffer in general both in terms of
temporal uniformity in the convergence of the respective filtering approximations and in terms
of exponential dependence on the dimensionality of the observation process [45] and that of the
state process as well [44], greatly affecting the rate of convergence. In fact, as also shown in [45], in
order to somewhat circumvent these important practical limitations, strict assumptions must
hold regarding the structure of the partially observable system under consideration, therefore
somewhat limiting the general applicability of the respective methods presented in [45]. Of
course, the grid based filters proposed in this paper suffer from similar drawbacks [35], a fact
that strengthens the common belief that, at least in the context of nonlinear filtering, the curse
of dimensionality constitutes a ubiquitous phenomenon.

However, from a technical point of view, the grid based filters we propose for effectively
solving the SCST and SSCP problems are very consistent, in the sense that their convergence
to the true nonlinear filter of the state is compact in time and uniform with respect to a set of
possible outcomes of almost full probability measure. Although, due to our inability to show
uniform convergence in time (while keeping the class of admissible hidden Markov processes
large), we cannot theoretically prove that the proposed approximate filters can indeed reach a
stable steady state, we can at least guarantee that the approximation error will be uniformly
bounded for any fixed time interval set by the user, with overwhelmingly high probability
(for more details, see [46] and [35]). Further, as we will see in the next section, the practical
performance of the filters, at least when considering the autocorrelation kernel of (6), is very
robust and tracks the hidden system accurately, for a relatively small number of quantization
cells, without the need for any fine tuning, as opposed to the case of particle filters (choice of
importance density, etc.) [38].


V. Numerical Simulations 

The practical effectiveness of Algorithms 1 and 2 will also be validated through a number
of sufficiently representative synthetic experiments. Specifically, we consider $N = 30$ sensors
randomly scattered on a sufficiently fine square grid, in the square region of the $x/y$ plane
$\mathcal{S} = [0, 40]^2$ (in m × m). The position of the reference antenna is fixed at $\mathbf{p}_{ref} = [25 \;\; 10]^T$.
Concerning the behavior of the communication channel throughout the plane, the variance of
the multipath fading noise term is set at $\sigma_\xi^2 = 2$ and, as far as shadowing is concerned, the
autocorrelation kernel of (6) is employed, where, for simplicity, we assume that the correlation
distance is known and constant with respect to time and equal to 10 m. As a result, in this
simple example, the channel state is two dimensional, with $X_t(1) \equiv \mu(X_t)$ and $X_t(2) \equiv \theta(X_t)$
representing the path loss coefficient and the shadowing power, respectively.

[Figure 2, panel (a): "Nonlinear Filtering of $X_t(1)$" and "Nonlinear Filtering of $X_t(2)$"; panel (b): "Spatial Channel Prediction at time $t$" and "Spatial Channel Prediction at time $t + 1$".]

Figure 2: (a) Demonstration of channel state tracking for 250 time steps. The estimates
are produced from the observations of 30 randomly scattered sensors in the square region
$[-20\text{ m}, 20\text{ m}]^2$. (b) Spatial prediction and temporal tracking of the channel combined. In this
example, the spatial grid consists of 3600 evaluation points and the results are obtained using
just 30 spatial measurements. Observe how the prediction procedure dynamically captures
the basic characteristics of the channel, exploiting the spatial correlations due to shadowing.

The temporal evolution of each respective component of the channel state is given by the stochastic difference
equations

$$X_t(1) \equiv \tanh\left( \gamma \left( X_{t-1}(1) - 2 \right) \right) + W_t + 2, \quad \text{and} \tag{42}$$

$$X_t(2) \equiv 0.3 \left| \tanh\left( \sin\left( \gamma X_{t-1}(2) W_t \right) + X_{t-1}(2) W_t \right) + W_t \right| + 25, \quad \forall t \in \mathbb{N}, \tag{43}$$

for some arbitrary but known initial conditions, where $\gamma \equiv 1.6$ and $W_t \equiv \mathrm{clip}_{[-1,1]}(G_t)$,
$G_t \overset{i.i.d.}{\sim} \mathcal{N}(0, 1)$, with $\mathrm{clip}_{[-1,1]}(\cdot)$ denoting the hard limiter operation into the set $[-1, 1]$. Note
that both difference equations are strongly nonlinear in both the state and the driving noise.
Also note the strong coupling between the two equations, due to the fact that both are driven
by exactly the same noise realizations. The above equations attempt to model a situation where
the path loss exponent is somewhat slowly varying between 0 and 4, whereas the shadowing
power is rapidly varying between 25 and 25.6. The state $X_t \equiv [X_t(1) \;\; X_t(2)]^T$ was uniformly
quantized into $L_S \equiv 30^2$ cells (that is, $L \equiv 30$). For simplicity, regarding the simulation results
which will be presented and discussed below, we focus on the case where $p \equiv 0$, that is, we
consider the problems of temporal filtering of the channel state and spatial prediction of the
channel, which constitute instances of the SCST and SSCP problems, respectively.
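The state recursions (42) and (43) are straightforward to simulate. The Python sketch below does so under stated assumptions: NumPy is used for random number generation, the initial condition and the seed are arbitrary placeholder choices (the text only says the initial conditions are known), and the function name `simulate_state` is hypothetical.

```python
import numpy as np

def simulate_state(T, x0=(2.0, 25.0), gamma=1.6, seed=0):
    """Simulate T steps of the two-dimensional channel state of (42)-(43).

    x0 and seed are illustrative assumptions, not values from the experiments.
    Returns an array of shape (T + 1, 2) whose first row is x0.
    """
    rng = np.random.default_rng(seed)
    X = np.empty((T + 1, 2))
    X[0] = x0
    for t in range(1, T + 1):
        # Hard-limited Gaussian noise; the same sample drives both components.
        w = np.clip(rng.standard_normal(), -1.0, 1.0)
        x1, x2 = X[t - 1]
        X[t, 0] = np.tanh(gamma * (x1 - 2.0)) + w + 2.0
        X[t, 1] = 0.3 * abs(np.tanh(np.sin(gamma * x2 * w) + x2 * w) + w) + 25.0
    return X

trajectory = simulate_state(250)
```

Since both the hyperbolic tangent and the clipped noise take values in the interval from -1 to 1, the first component stays between 0 and 4 and the second between 25 and 25.6, in agreement with the ranges quoted above.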

Fig. 2a demonstrates the channel state tracking (temporal filtering of the state) for 250 time
steps, according to the experimental setting stated above. As illustrated in the figure, the
quality of the estimates is very good, considering the nonlinearity present both in the state
process and the observations at each sensor in the network. It is also apparent that the produced
estimation process behaves in a stable manner as time increases.

The filter of the channel state can subsequently be used for the also asymptotically optimal
prediction of the channel magnitude in the rest of the space. This is illustrated in Fig. 2b, where
the combined spatial prediction and temporal tracking of the channel magnitudes are shown
(for two time instants), in comparison to the real channel maps in the square region under
consideration. The random field used for modeling the spatial channel process was generated
using a spatial grid of 3600 points and the respective predicted values were obtained from
just $N = 30$ randomly scattered spatial channel measurements in the region of interest. From
the figure, it can be seen that the quality of the predicted process is very good, especially
considering the fact that the channel is reconstructed using only 0.83 % of the total number of
grid points in the region of interest. Of course, the quality of the spatial prediction improves as
the number of spatial measurements (and therefore nodes/sensors) increases. One can observe
that the prediction procedure accurately captures the basic characteristics of the channel in
the region of interest, effectively exploiting the spatial correlations due to shadowing.

VI. Conclusion 

In this paper, a nonlinear filtering framework was proposed for addressing the fundamental
problems of sequential channel state tracking and spatiotemporal channel prediction in mobile
wireless sensor networks. First, we formulated the channel observations at each sensor as a
partially observable nonlinear system with temporally varying state and spatiotemporally varying
observations. Then, a grid based approximate filtering scheme was employed for accurately
tracking the temporal variation of the channel state, based on which we proposed a recursive
spatiotemporal channel gain predictor, providing real time sequential CG map estimation at
each sensor in the network. Further, we showed that both estimators are asymptotically
optimal, in the sense that they converge to the optimal MMSE estimators/predictors of the
channel state and observations at unobserved positions in the region of interest, respectively,
in a technically strong sense. In addition to these theoretical results, numerical simulations
were presented, validating the practical effectiveness of the proposed approach and increasing
the user’s confidence for practical consideration in real world wireless networks.

Appendix 

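Proof of Theorem 4

Let us first consider the filtering case, that is, the one where $p = 0$. Substituting (35) to
(36), we get

$$\begin{aligned} \widehat{y}_t(\mathbf{q}) &= \mathbb{E}\left\{ A_{\mathbf{q}} X_t + \left( \sigma^{\mathbf{q}}_t(X_t) \right)^T C_t^{-1}(X_t) \left( \mathbf{y}_t - A_t X_t \right) \middle| \mathscr{Y}_t \right\} \\ &= A_{\mathbf{q}} \mathbb{E}\left\{ X_t \middle| \mathscr{Y}_t \right\} + \mathbb{E}\left\{ \left( \sigma^{\mathbf{q}}_t(X_t) \right)^T C_t^{-1}(X_t) \middle| \mathscr{Y}_t \right\} \mathbf{y}_t \\ &\quad - \mathbb{E}\left\{ \left( \sigma^{\mathbf{q}}_t(X_t) \right)^T C_t^{-1}(X_t) A_t X_t \middle| \mathscr{Y}_t \right\}, \quad \forall t \in \mathbb{N}, \end{aligned} \tag{44}$$

from where, defining the bounded and continuous functionals

$$\phi^1_t(X_t) \triangleq \left( \left( \sigma^{\mathbf{q}}_t(X_t) \right)^T C_t^{-1}(X_t) \right)^T \in \mathbb{R}^{N \times 1} \quad \text{and} \tag{45}$$

$$\phi^2_t(X_t) \triangleq \left( \sigma^{\mathbf{q}}_t(X_t) \right)^T C_t^{-1}(X_t) A_t X_t \in \mathbb{R}, \tag{46}$$

we can write

$$\widehat{y}_t(\mathbf{q}) \equiv \mathbb{E}\left\{ y_t(\mathbf{q}) \middle| \mathscr{Y}_t \right\} = A_{\mathbf{q}} \mathbb{E}\left\{ X_t \middle| \mathscr{Y}_t \right\} + \left( \mathbb{E}\left\{ \phi^1_t(X_t) \middle| \mathscr{Y}_t \right\} \right)^T \mathbf{y}_t - \mathbb{E}\left\{ \phi^2_t(X_t) \middle| \mathscr{Y}_t \right\}. \tag{47}$$

Then, for all $t \in \mathbb{N}$, define the approximate operator

$$\mathcal{E}^{L_S}\left( y_t(\mathbf{q}) \middle| \mathscr{Y}_t \right) \triangleq A_{\mathbf{q}} \mathcal{E}^{L_S}\left( X_t \middle| \mathscr{Y}_t \right) + \left( \mathcal{E}^{L_S}\left( \phi^1_t(X_t) \middle| \mathscr{Y}_t \right) \right)^T \mathbf{y}_t - \mathcal{E}^{L_S}\left( \phi^2_t(X_t) \middle| \mathscr{Y}_t \right). \tag{48}$$

Let us study (48) in terms of its potential asymptotic optimality properties. Using the triangle
inequality, it is true that

$$\begin{aligned} \left| \mathcal{E}^{L_S}\left( y_t(\mathbf{q}) \middle| \mathscr{Y}_t \right) - \widehat{y}_t(\mathbf{q}) \right| &\le \left| A_{\mathbf{q}} \left( \mathcal{E}^{L_S}\left( X_t \middle| \mathscr{Y}_t \right) - \mathbb{E}\left\{ X_t \middle| \mathscr{Y}_t \right\} \right) \right| \\ &\quad + \left| \left( \mathcal{E}^{L_S}\left( \phi^1_t(X_t) \middle| \mathscr{Y}_t \right) - \mathbb{E}\left\{ \phi^1_t(X_t) \middle| \mathscr{Y}_t \right\} \right)^T \mathbf{y}_t \right| \\ &\quad + \left| \mathcal{E}^{L_S}\left( \phi^2_t(X_t) \middle| \mathscr{Y}_t \right) - \mathbb{E}\left\{ \phi^2_t(X_t) \middle| \mathscr{Y}_t \right\} \right|. \end{aligned} \tag{49}$$

Also, from the Cauchy-Schwarz inequality and the fact that the $\ell_2$ norm of a vector is upper
bounded by its $\ell_1$ norm,

$$\begin{aligned} \left| \mathcal{E}^{L_S}\left( y_t(\mathbf{q}) \middle| \mathscr{Y}_t \right) - \widehat{y}_t(\mathbf{q}) \right| &\le \left| \alpha_{\mathbf{q}} \right| \left\| \mathcal{E}^{L_S}\left( X_t \middle| \mathscr{Y}_t \right) - \mathbb{E}\left\{ X_t \middle| \mathscr{Y}_t \right\} \right\|_1 \\ &\quad + \left\| \mathbf{y}_t \right\|_2 \left\| \mathcal{E}^{L_S}\left( \phi^1_t(X_t) \middle| \mathscr{Y}_t \right) - \mathbb{E}\left\{ \phi^1_t(X_t) \middle| \mathscr{Y}_t \right\} \right\|_1 \\ &\quad + \left| \mathcal{E}^{L_S}\left( \phi^2_t(X_t) \middle| \mathscr{Y}_t \right) - \mathbb{E}\left\{ \phi^2_t(X_t) \middle| \mathscr{Y}_t \right\} \right|. \end{aligned} \tag{50}$$

Now, from ([46], Lemma 7), it follows that for any natural $T < \infty$, there exists a bounded
constant $\gamma > 1$, such that

$$\sup_{t \in \mathbb{N}_T} \left\| \mathbf{y}_t(\omega) \right\|_2 < \sqrt{\gamma C N \left( 1 + \log(T + 1) \right)}, \tag{51}$$

for all $\omega \in \widehat{\Omega}_T \subseteq \Omega$, with measure at least

$$1 - \frac{\exp(-CN)}{(T+1)^{CN-1}}, \tag{52}$$

exactly as in Theorem 2. Therefore, directly invoking Theorems 2 and 3, it readily follows that,
under the respective conditions,

$$\lim_{L_S \to \infty} \sup_{t \in \mathbb{N}_T} \sup_{\omega \in \widehat{\Omega}_T} \left| \mathcal{E}^{L_S}\left( y_t(\mathbf{q}) \middle| \mathscr{Y}_t \right) - \widehat{y}_t(\mathbf{q}) \right| = 0, \tag{53}$$

showing the second part of the theorem, when $p$, the prediction horizon, coincides with zero.
For the first part, observe that the approximate predictor $\mathcal{E}^{L_S}(y_t(\mathbf{q}) | \mathscr{Y}_t)$ can be explicitly
expressed as (see Theorems 1 and 3)

$$\begin{aligned} \mathcal{E}^{L_S}\left( y_t(\mathbf{q}) \middle| \mathscr{Y}_t \right) &= A_{\mathbf{q}} \mathbf{X} \frac{E_t}{\|E_t\|_1} + \mathbf{y}_t^T \Phi^1_t \frac{E_t}{\|E_t\|_1} - \Phi^2_t \frac{E_t}{\|E_t\|_1} \\ &= \left( A_{\mathbf{q}} \mathbf{X} + \mathbf{y}_t^T \Phi^1_t - \Phi^2_t \right) \frac{E_t}{\|E_t\|_1} \\ &= \left( \left( A_{\mathbf{q}} \mathbf{X} \right)^T + \left( \Phi^1_t \right)^T \mathbf{y}_t - \left( \Phi^2_t \right)^T \right)^T \frac{E_t}{\|E_t\|_1}, \end{aligned} \tag{54}$$

where the vector multiplying $E_t / \|E_t\|_1$ can, after simple algebra, be easily shown to coincide with the vector process $\phi_t(\mathbf{y}_t)$
present in the statement of Theorem 4.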

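In the prediction case, that is, when $p \ge 1$, the procedure is slightly different. Let us first
define the complete filtration generated by $X_{t+p}$, $\mathbf{y}_t$ and $y_t(\mathbf{q})$ as

$$\mathscr{H}^{+p}_t \triangleq \sigma\left\{ \left\{ X_{i+p}, \mathbf{y}_i, y_i(\mathbf{q}) \right\}_{i \in \mathbb{N}_t} \right\}. \tag{55}$$

Also, note that, for all $p \ge 1$, the augmented observation vector process (and therefore each
one of its elements)

$$\mathbf{y}^{aug}_{t+p} \triangleq \begin{bmatrix} \mathbf{y}_{t+p} \\ y_{t+p}(\mathbf{q}) \end{bmatrix} \in \mathbb{R}^{(N+1) \times 1} \tag{56}$$

is conditionally independent of $\mathbf{y}^{aug}_0, \mathbf{y}^{aug}_1, \ldots, \mathbf{y}^{aug}_t$ given the state at time $t + p$, $X_{t+p}$. Thus, using
the tower property, it is true that

$$\begin{aligned} \widehat{y}_{t+p}(\mathbf{q}) &\equiv \mathbb{E}\left\{ y_{t+p}(\mathbf{q}) \middle| \mathscr{Y}_t \right\} \\ &= \mathbb{E}\left\{ \mathbb{E}\left\{ y_{t+p}(\mathbf{q}) \middle| \mathscr{H}^{+p}_t \right\} \middle| \mathscr{Y}_t \right\} \\ &= \mathbb{E}\left\{ \mathbb{E}\left\{ y_{t+p}(\mathbf{q}) \middle| X_{t+p} \right\} \middle| \mathscr{Y}_t \right\} \\ &= \mathbb{E}\left\{ \mathbb{E}\left\{ A_{\mathbf{q}} X_{t+p} + \sigma_{\mathbf{q}}\left( X_{t+p} \right) + \xi_{\mathbf{q}} \middle| X_{t+p} \right\} \middle| \mathscr{Y}_t \right\} \\ &= \mathbb{E}\left\{ A_{\mathbf{q}} X_{t+p} \middle| \mathscr{Y}_t \right\}, \end{aligned} \tag{57}$$

or, equivalently,

$$\widehat{y}_{t+p}(\mathbf{q}) \equiv A_{\mathbf{q}} \mathbb{E}\left\{ X_{t+p} \middle| \mathscr{Y}_t \right\}. \tag{58}$$

Consequently, defining the approximate spatiotemporal predictor

$$\mathcal{E}^{L_S}\left( y_{t+p}(\mathbf{q}) \middle| \mathscr{Y}_t \right) \triangleq A_{\mathbf{q}} \mathcal{E}^{L_S}\left( X_{t+p} \middle| \mathscr{Y}_t \right), \tag{59}$$

substituting $\mathcal{E}^{L_S}(X_{t+p} | \mathscr{Y}_t)$ from Theorem 1 and following a very similar convergence analysis
to the filtering case treated above, the respective results present in the statement of Theorem
4 follow. The proof is complete. ■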